Find Home

Xu Yang

11/22/2017

Project Purpose

People move for many reasons: changing jobs, changing their children’s school, company relocation, seeking quiet, or looking for a more energetic life. Whatever the case, you need a reason to move. The question is: how reasonable is your reason?

A reasonable reason is not “I just wanna move, but I don’t know where or how.” It should be a plan, a solution that convinces you, and your family too, that you are making the right choice.

But how do you prove that a choice is the right one? Sometimes you think you have made the right decision, but several years, months, or even days later you realize it was a big mistake and you have to move again. In some cases, one more move costs a great deal of time and money!

This project is an analysis meant to support that decision.

Data Source

To do analysis, we need data. Here we will focus on schools and jobs. Since I moved to Washington State without doing any data analysis, I will start with Washington State.

Public Schools and Private Schools

https://www.niche.com/k12/search/best-schools/s/washington/

We can scrape the lists of public and private schools from niche.com using library(‘rvest’) and SelectorGadget (http://selectorgadget.com/). The reason to use data from Niche is that it comes with Niche’s school rankings. There are other ranking sources, such as US News, and you can use whichever source you trust more.
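As a rough sketch of what the scraping step looks like with rvest: the HTML below is a toy stand-in, and the CSS selectors (li.result, .rank, .name) are hypothetical, of the kind SelectorGadget would suggest. The real page structure on niche.com is not shown in this report and will differ.

```r
library(rvest)

# Toy HTML standing in for one page of search results.
# School names and the page structure are made up for illustration.
page <- minimal_html('
  <ol>
    <li class="result"><span class="rank">1</span><h2 class="name">Example High School</h2></li>
    <li class="result"><span class="rank">2</span><h2 class="name">Sample Academy</h2></li>
  </ol>')

# Hypothetical selectors: one node per school, then one field per node.
results <- html_elements(page, "li.result")
schools <- data.frame(
  rank = as.integer(html_text2(html_element(results, ".rank"))),
  name = html_text2(html_element(results, ".name")),
  stringsAsFactors = FALSE
)
print(schools)
```

For a live page, minimal_html() would be replaced by read_html() on the search URL, with the selectors adjusted to whatever SelectorGadget reports.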

Job Market Data

Employment Security Department of Washington State: https://esd.wa.gov/labormarketinfo/occupations

I chose “Occupational employment and wage estimates for 2017”. The “Monthly employment report” is good too, but it is only available as PDF files, not as Excel or CSV. Another reason is that the former comes with historical data files.

How to connect schools and jobs?

Each school has an address with a five-digit ZIP code. The occupational employment and wage estimates are calculated by Core Based Statistical Area (CBSA), a U.S. geographic area defined by the Office of Management and Budget (OMB). So we need a connection between ZIP codes and CBSAs.

There is a good place to find that connection: the Missouri Census Data Center (http://mcdc2.missouri.edu/websas/geocorr2k.html).
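Once a ZIP-to-CBSA table has been downloaded, the connection is a plain table join. Here is a minimal sketch in base R; the column names and every value in the two toy tables are made up for illustration and are not taken from the real files.

```r
# Toy school list: one row per school, ZIP kept as text to preserve leading zeros.
schools <- data.frame(
  school = c("School A", "School B", "School C"),
  zip = c("98101", "98501", "99999"),
  stringsAsFactors = FALSE
)

# Toy ZIP-to-CBSA crosswalk; the CBSA codes are placeholders.
zip_cbsa <- data.frame(
  zip = c("98101", "98501"),
  cbsa = c("11111", "22222"),
  stringsAsFactors = FALSE
)

# Left join: keep every school, with NA where the ZIP has no CBSA match.
schools_cbsa <- merge(schools, zip_cbsa, by = "zip", all.x = TRUE)
print(schools_cbsa)
```

Keeping the unmatched rows (all.x = TRUE) makes it easy to see later how many schools fell out of the mapping.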

But the school lists from Niche do not include school addresses. To get each school’s address we would need a second layer of scraping, one page per school. That’s too complicated.

Instead, we can use another school list that has addresses but no rankings: the OSPI website of Washington State (http://www.k12.wa.us/default.aspx).

CBSA, MSA, μSA

CBSAs include metropolitan statistical areas (MSAs) and micropolitan statistical areas (μSAs).

Some large MSAs are further subdivided into metropolitan divisions, so two or more of the areas used here can belong to one CBSA.

To keep things simple, I treat MSAs and μSAs as the same kind of Area, so that we can compare them on the same map.

Yes, Area: that’s the purpose of this project, choosing an area to live in and making our lives better.

A good way to view areas: a map

We have the school lists, the school ZIP codes, and the ZIP-CBSA connection.

We still need map data for the CBSAs, so that we can view the schools on a map.

The US Census Bureau has nationwide CBSA map data (https://www.census.gov/geo/maps-data/data/tiger-line.html).

R has a good tool for getting data from the Census Bureau: library(tigris).

We can use the core_based_statistical_areas() and metro_divisions() functions to get the map data for all MSAs and μSAs, and combine them.

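library(tigris)

cb <- core_based_statistical_areas(cb = TRUE)  # CBSA boundaries (cartographic boundary file)

md <- metro_divisions()  # metropolitan division boundaries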

Then we use the geo_join() function to bind the school data to the area map data, and use leaflet to draw the maps. Note: Leaflet is an open-source JavaScript library for interactive maps.
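The join-and-map step can be sketched roughly as follows. This is not the exact code behind the maps in this report: the summary table, its best_rank column, and the helper name make_area_map are assumptions for illustration, the GEOID values should be checked against the downloaded boundary table, and the sketch uses only the CBSA boundaries rather than the combined CBSA and metro division data.

```r
library(tigris)
library(leaflet)

# Hypothetical per-area summary: one row per area, keyed by the GEOID
# column of the Census boundary table. Values are illustrative only.
area_summary <- data.frame(
  GEOID = c("42660", "36500"),
  best_rank = c(1L, 5L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: download boundaries, join the summary, draw the map.
make_area_map <- function(area_summary) {
  # Downloads the CBSA cartographic boundary file (needs network access).
  cb <- core_based_statistical_areas(cb = TRUE)
  # Keep only the areas that appear in the summary table.
  joined <- geo_join(cb, area_summary, by_sp = "GEOID", by_df = "GEOID", how = "inner")
  # Leaflet expects longitude/latitude coordinates in WGS84.
  joined <- sf::st_transform(joined, 4326)
  # Darker fill for a better (numerically smaller) best rank.
  pal <- colorNumeric("YlGnBu", domain = joined$best_rank, reverse = TRUE)
  leaflet(joined) %>%
    addTiles() %>%
    addPolygons(fillColor = ~pal(best_rank), fillOpacity = 0.6,
                weight = 1, color = "#444444", popup = ~NAME)
}

# area_map <- make_area_map(area_summary)  # uncomment to download and draw
```

Wrapping the download in a function keeps the sketch loadable without network access; calling it returns a leaflet widget that can be printed in an R Markdown document.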

Now, we can do mapping for public and private schools!

Public Schools on MSAs and μSAs

Here each area is colored according to the highest rank among its schools, following a common way of thinking: which is the best school in this area? Clicking on an area shows a list of its public schools ordered by rank.
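The two pieces of information behind each polygon, the best rank used for the color and the ordered school list shown on click, can be computed with base R. A minimal sketch, assuming a table with area, school, and rank columns (toy values, where a smaller rank number is better):

```r
# Toy data: area names, school names and ranks are made up.
schools <- data.frame(
  area = c("Area X", "Area X", "Area Y"),
  school = c("School A", "School B", "School C"),
  rank = c(12L, 3L, 7L),
  stringsAsFactors = FALSE
)

# Best (numerically smallest) rank per area, used to pick the fill color.
best_rank <- aggregate(rank ~ area, data = schools, FUN = min)
names(best_rank)[2] <- "best_rank"

# Popup text per area: schools ordered by rank, joined with HTML line breaks.
ordered <- schools[order(schools$area, schools$rank), ]
popup <- tapply(paste0(ordered$rank, ". ", ordered$school),
                ordered$area, paste, collapse = "<br/>")

print(best_rank)
print(popup)
```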

From the plot, we can see that Seattle-Bellevue-Everett and Olympia-Tumwater have considerably more public schools than other areas.

Between the two, Olympia-Tumwater has more public schools, and Seattle-Bellevue-Everett has more top-ten public schools.

To simplify this report, we can abbreviate the name of each area.

For example, we can call Seattle-Bellevue-Everett A_Seattle and Portland-Vancouver-Hillsboro A_Portland.

A_ means Area_.

We can still see the full names on the plots.

Plots: Private Schools on MSAs and μSAs

A_Seattle has the most top-ranked private schools.

A_Seattle also has the most private schools. But we should not have just one candidate: the A_Tacoma area is also a good choice.

On the map for private schools, the number of schools is far smaller than for public schools.

I actually scraped 138 private schools from the Niche website, and more from the OSPI website.

But there are just 57 private schools on the map.

The reason is that many private schools’ ZIP codes were not matched in the ZIP-CBSA list (the public schools worked well).
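A quick diagnostic for this kind of mismatch is to list the ZIP codes that find no partner in the crosswalk. The sketch below also normalizes the ZIP format first; that formatting differences (ZIP+4 suffixes, stray spaces, dropped leading zeros) are part of the cause is only an assumption, not something verified here, and normalize_zip is a hypothetical helper. All values are toy data.

```r
# Hypothetical helper: reduce a ZIP string to a plain five-digit code.
normalize_zip <- function(z) {
  z <- trimws(as.character(z))          # drop surrounding spaces
  z <- sub("-.*$", "", z)               # drop a ZIP+4 suffix such as "-1234"
  paste0(strrep("0", pmax(0, 5 - nchar(z))), z)  # restore leading zeros
}

# Toy private school ZIPs and a toy crosswalk.
private <- data.frame(zip = c("98101-1234", " 98501", "99999"),
                      stringsAsFactors = FALSE)
zip_cbsa <- data.frame(zip = c("98101", "98501"), stringsAsFactors = FALSE)

matched_before <- sum(private$zip %in% zip_cbsa$zip)
matched_after <- sum(normalize_zip(private$zip) %in% zip_cbsa$zip)
unmatched <- setdiff(normalize_zip(private$zip), zip_cbsa$zip)

print(c(before = matched_before, after = matched_after))
print(unmatched)
```

Whatever remains in unmatched after normalization is a ZIP code that is genuinely missing from the crosswalk and needs another source.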

For FindHome’s next version, the first thing I need to do is build a better ZIP-CBSA mapping list.

OK, now we can make an area selection for public schools and for private schools.

Selection 1: Best Areas for Schools

Employment and Annual Wage

On this map, we can see that A_Seattle and A_Portland have distinctly more employment than other areas. Both have employment above one million. Clicking on an area shows a list of the 22 occupation categories with employment and mean annual wage.

Here we use 22 distinct colors to mark each occupation’s employment by its occupation category. A_Seattle and A_Portland clearly have more employment.

Now we have a second selection.

Selection 2: Areas for Most Employment in 2017

Among those categories, we can see six occupation categories that have the most employment/occupations:

Note: Computer_Mathematical and Transportation_MaterialMoving are much stronger in A_Seattle than in other areas.

A_Portland has more occupations than A_Seattle! Before I moved here, I never knew there was an area in Washington State as big as the greater Seattle metropolitan area. This is good news for me.

Oh, Healthcare_Practitioner_Technical has so many high-wage occupations!

Next is Management.

Transportation_MaterialMoving, Computer_Mathematical, and Legal have some high-wage positions.

We can put together a set of the occupation categories with the most high-wage occupations:

So far, we know there are two areas with the most occupations and employment in Washington State.

How about comparing those two areas over the past ten years? Let’s do it.

A_Seattle vs. A_Portland (2008-2017)

For mean annual wage over the past ten years, Management, Computer_Mathematical, Architecture, and Business_Financial have beautiful upward-sloping lines.

Healthcare_Practitioner_Technical and Legal are also strong.

Here we use total employment as the line width: the thicker the line, the more employment.
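A plot of this kind can be sketched with ggplot2 by mapping total employment to the line width. The data below are toy values and the column names are assumptions; the linewidth aesthetic assumes ggplot2 3.4 or later (older versions use size instead).

```r
library(ggplot2)

# Toy data: mean annual wage and total employment per category and year.
wages <- data.frame(
  year = rep(2015:2017, times = 2),
  category = rep(c("Category A", "Category B"), each = 3),
  mean_wage = c(90000, 95000, 101000, 60000, 61000, 63000),
  employment = c(50000, 55000, 62000, 120000, 118000, 121000)
)

# One line per category; thicker segments mean more employment that year.
p <- ggplot(wages, aes(x = year, y = mean_wage,
                       colour = category, group = category)) +
  geom_line(aes(linewidth = employment), lineend = "round") +
  scale_x_continuous(breaks = 2015:2017) +
  labs(x = "Year", y = "Mean annual wage",
       colour = "Occupation category", linewidth = "Employment")

print(p)
```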

On this combined measure, Management, Computer_Mathematical, Healthcare_Practitioner_Technical, and Business_Financial have the strongest lines.

We notice a sudden change in line width for A_Portland in 2010. Why?

Let’s look at the employment plot.

This plot shows employment summarized by occupation category.

There is an obvious jump in employment for every occupation category in A_Portland in 2010!
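A jump like this can also be flagged numerically by comparing each year’s total with the previous year’s. A minimal sketch in base R; the totals are toy numbers, and the 1.5 threshold is an arbitrary assumption.

```r
# Toy yearly employment totals for one area (not the real figures).
employment <- data.frame(
  year = 2008:2011,
  total = c(100, 102, 900, 910)
)

# Ratio of each year's total to the previous year's (NA for the first year).
n <- nrow(employment)
employment$ratio <- c(NA, employment$total[-1] / employment$total[-n])

# Years where employment grew by more than 50% in one step.
jump_years <- employment$year[!is.na(employment$ratio) & employment$ratio > 1.5]
print(jump_years)
```

A year flagged this way is a hint to go back to the data file descriptions and check whether the area definition itself changed.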

After checking the descriptions in the original data files, I found this timeline:

Year | Area_Name | Include_Counties
2008 | Vancouver MSA | Includes Clark and Skamania Counties
2009 | vancouver | Includes Clark and Skamania Counties
2010 | Portland | Includes Clackamas, Columbia, Multnomah, Washington and Yamhill, OR; and Clark and Skamania, WA Counties
2011 | Portland | Includes Clackamas, Columbia, Multnomah, Washington and Yamhill, OR; and Clark and Skamania, WA Counties
2012 | Portland-Vancouver MSA | Includes Clackamas, Columbia, Multnomah, Washington and Yamhill, OR; and Clark and Skamania, WA Counties
2013 | Portland-Vancouver MSA | Includes Clackamas, Columbia, Multnomah, Washington and Yamhill, OR; and Clark and Skamania, WA Counties
2014 | Portland-Vancouver MSA | Includes Clackamas, Columbia, Multnomah, Washington and Yamhill, OR; and Clark and Skamania, WA Counties
2015 | Port-Vancouver-Hillsboro MSA | Includes Clackamas, Columbia, Multnomah, Washington and Yamhill, OR; and Clark and Skamania, WA Counties
2016 | Port-Vancouver-Hillsboro MSA | Includes Clark and Skamania (WA); Clackamas, Columbia, Multnomah, Washington and Yamhill (OR) counties
2017 | Vancouver Portland-Hillsboro MSA | Includes Clark and Skamania (WA); Clackamas, Columbia, Multnomah, Washington and Yamhill (OR) counties

Five counties in Oregon (Clackamas, Columbia, Multnomah, Washington, and Yamhill) were combined with two counties in Washington State (Clark and Skamania). Why was this done? With this combination, can people really work and live there the way they do in A_Seattle? To answer those questions, it seems I would need to do some Data_Word_Science.

Let’s put those questions aside and keep moving forward.

We notice that Computer_Mathematical and Business_Financial are much stronger in A_Seattle than in A_Portland.

Considering employment and mean annual wage together, we can make a third selection!

We can use the intersection of employment and mean annual wage to make a simple selection:

Selection 3: Strongest Occupation Categories in A_Seattle and A_Portland
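The intersection idea can be sketched in base R: take the top categories by employment, the top categories by mean annual wage, and keep those that appear in both lists. The category names, the numbers, the cutoff of two, and the helper top_n_by are all made up for illustration.

```r
# Toy per-category statistics for one area.
stats <- data.frame(
  category = c("A", "B", "C", "D"),
  employment = c(500, 300, 800, 100),
  mean_wage = c(115, 120, 40, 110),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the n categories with the largest value in a column.
top_n_by <- function(df, column, n) {
  df$category[order(df[[column]], decreasing = TRUE)][seq_len(n)]
}

top_employment <- top_n_by(stats, "employment", 2)  # largest employment
top_wage <- top_n_by(stats, "mean_wage", 2)         # highest mean wage

# Categories that are strong on both measures.
strong <- intersect(top_employment, top_wage)
print(strong)
```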

Next, let’s compare occupations between A_Seattle and A_Portland.

From this plot, we can see that below roughly the 40K annual wage line, A_Portland has more occupations than A_Seattle. But above that line, A_Seattle has more.

Oh, what a high mountain! Healthcare_Practitioner_Technical has quite a big gap between its lowest and highest wages!

Here we split annual wage from zero to 300K into six groups and see what the distribution looks like.
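Grouping wages into bins is a one-liner with cut(). The sketch below assumes six equal-width groups of 50K each, which is one natural reading of "six groups from zero to 300K" but not necessarily the exact boundaries used for the plot; the wage values are toy numbers.

```r
# Toy annual wages for a handful of occupations.
wage <- c(32000, 48000, 75000, 120000, 155000, 260000)

# Six equal-width groups from 0 to 300K (assumed boundaries).
breaks <- seq(0, 300000, by = 50000)
labels <- c("0-50K", "50-100K", "100-150K", "150-200K", "200-250K", "250-300K")

# Each interval includes its upper bound; include.lowest also keeps a wage of 0.
wage_group <- cut(wage, breaks = breaks, labels = labels, include.lowest = TRUE)

# Number of occupations per wage group.
print(table(wage_group))
```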

Above 150K annual wage, Healthcare_Practitioner_Technical has the most dots in both areas.

Management has more dots in A_Seattle than in A_Portland.

Computer_Mathematical has just one dot in each area.

From 50K to 150K, A_Seattle has considerably more employment, especially in the Computer_Mathematical category.

Next, let’s see how annual wage and employment changed over the past ten years in those two areas.

My goodness, Management is a real golden category! People in it don’t have enough time to enjoy the blue sky! Employers in A_Portland paid Computer_Mathematical people more in 2011. Their counterparts in A_Seattle did the same in 2015, but paid Business_Financial people less that year.

For employment, Office_Administrative is the No. 1 contributor in both areas. Sales, Transportation_MaterialMoving, and Food_Preparation_Serving are also strong. In Computer_Mathematical and Business_Financial, employment in A_Seattle has grown stronger and stronger over the most recent five years.

Here we put the mean, min, and max values together for the past five years. Healthcare_Practitioner_Technical shows a jump in its highest wage in A_Portland in 2017. The same kind of jump happened that year in A_Seattle, in Transportation_MaterialMoving.
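Putting mean, min, and max side by side comes down to one grouped summary. A minimal sketch with aggregate() in base R, using toy wages and assumed column names:

```r
# Toy occupation-level wages: two occupations per year in one category.
occ <- data.frame(
  year = c(2016, 2016, 2017, 2017),
  category = "Category A",
  wage = c(50, 70, 60, 100),
  stringsAsFactors = FALSE
)

# Mean, min and max wage per category and year.
summary_tbl <- aggregate(
  wage ~ category + year, data = occ,
  FUN = function(w) c(mean = mean(w), min = min(w), max = max(w))
)

# aggregate() returns the three statistics as one matrix column;
# flatten it into ordinary columns wage.mean, wage.min, wage.max.
summary_tbl <- do.call(data.frame, summary_tbl)
print(summary_tbl)
```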

Such a big Tetris piece of Computer_Mathematical in A_Seattle! It’s really a big feature of this area!

OK, we have finally finished all the plots. Let’s see what we have. We have three selections:

Selection 1: Best Areas for Schools

Selection 2: Areas for Most Employment in 2017

Selection 3: Strongest Occupation Categories in A_Seattle and A_Portland

From Selection 1 and Selection 2, we can easily pick A_Seattle.

It looks like the best area for public schools, private schools, and employment.

But looking for a job requires more thought about available opportunities and wage levels.

If someone has a well-paid job, then a private boarding school is also a good choice.

Let’s look at Selection 3: Strongest Occupation Categories in A_Seattle and A_Portland.

Here we find that Computer_Mathematical and Business_Financial have pretty good opportunities in A_Seattle, and Management is best in A_Portland.

How about me? I used to be a computer software engineer, and I have just found that I have a big interest in being a Data Scientist. So I should be in the Computer_Mathematical category.

The answer for me is: the Seattle-Bellevue-Everett area!

Further Thinking

This analysis doesn’t consider living expenses, the real estate market, or local services.

So it’s just a beginning version, something like FindHome_V_0.5.

And this is just my own point of view.

How about another person, such as a professional winemaker?

He or she doesn’t need the most active commercial area, but a good vineyard. For this, the Walla Walla area is the right choice.

If this analysis aims to serve one million people, then the version number should be FindHome_V_5.0e-7.